Methods and architecture for indexing and editing compressed video over the world wide web

ABSTRACT

A system and method is provided for editing and parsing compressed digital information. The compressed digital information may include visual information which is edited and parsed in the compressed domain. In a preferred embodiment, the present invention provides a method for detecting moving objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more captured scenes of video.

NOTICE OF GOVERNMENT RIGHTS

The U.S. Government has certain rights in this invention pursuant to the terms of the National Science Foundation CAREER award IRI-9501266.

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention relates to techniques for editing and parsing compressed digital information, and more specifically, to editing and parsing visual information in the compressed domain.

II. Description of the Related Art

With the increasing use of local area, wide area and global networks to spread information, digital video has become an essential component of many new media applications. The inclusion of video in an application often gives the application not only increased functional utility, but also an aesthetic appeal that cannot be obtained by text or audio information alone. However, while digital video greatly increases our ability to share information, it demands special technical support in processing, communication, and storage.

In order to reduce bandwidth requirements to manageable levels, video information is generally transmitted between systems in the digital environment in the form of compressed bitstreams that are in a standard format, e.g., Motion JPEG, MPEG-1, MPEG-2, H.261 or H.263. In these compressed formats, the Discrete Cosine Transform (“DCT”) is utilized in order to transform N×N blocks of pixel data, where N typically is set to eight, into the DCT domain where quantization is more readily performed. Run-length encoding and entropy coding (e.g., Huffman coding or arithmetic coding) are applied to the quantized coefficients to produce a compressed bitstream which has a significantly lower bit rate than the original uncompressed source signal. The process is assisted by additional side information, in the form of motion vectors, which are used to construct frame- or field-based predictions from neighboring frames or fields by taking into account the inter-frame or inter-field motion that is typically present.
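The transform and quantization steps can be illustrated with a short NumPy sketch. It is a minimal illustration rather than the coding path of any particular standard: it assumes the orthonormal DCT-II and a single uniform quantizer step size, whereas MPEG uses weighted quantization matrices and a per-macroblock scale factor. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds frequency k, so C @ x is the 1-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)          # frequency index (rows)
    i = np.arange(n).reshape(1, -1)          # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)              # DC row scaling
    c[1:, :] *= np.sqrt(2.0 / n)             # AC row scaling
    return c

def dct2(block: np.ndarray) -> np.ndarray:
    """2-D DCT of an N x N block: transform the columns, then the rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT; C is orthogonal, so its inverse is its transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniform scalar quantization with one step size for every coefficient."""
    return np.round(coeffs / step).astype(int)
```

For a flat 8×8 block of value 128, all of the energy lands in the DC coefficient (8 × 128 = 1024) and every AC coefficient is zero, which is why smooth image regions compress so well.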

In order to be usable by a receiving system, such coded bitstreams must be both parsed and decoded. For example, in the case of an MPEG-2 encoded bitstream, the bitstream must be parsed into slices and macroblocks before the information contained in the bitstream is usable by an MPEG-2 decoder. Parsed bitstream information may be used directly by an MPEG-2 decoder to reconstruct the original visual information, or may be subjected to further processing.

In the case of compressed digital video, further processing of video information can occur either in the normal, uncompressed domain or in the compressed domain. Indeed, there have been numerous attempts by others in the field to realize useful techniques for indexing and manipulating digital video information in both the uncompressed and compressed domains.

For example, in the article by S. W. Smoliar et al., “Content-Based Video Indexing and Retrieval,” IEEE Multimedia, Summer 1994, pp. 62-72, a color histogram comparison technique is proposed to detect scene cuts in the spatial (uncompressed) domain. In the article by B. Shahraray, “Scene Change Detection and Content-Based Sampling of Video Sequences,” SPIE Conf. Digital Image Compression: Algorithms and Technologies 1995, Vol. 2419, a block-based match and motion estimation algorithm is presented.

For compressed video information, the article by F. Arman et al., “Image Processing on Compressed Data for Large Video Databases,” Proceedings of ACM Multimedia '93, June 1993, pp. 267-272, proposes a technique for detecting scene cuts in JPEG compressed images by comparing the DCT coefficients of selected blocks from each frame. Likewise, the article by J. Meng et al., “Scene Change Detection in a MPEG Compressed Video Sequence,” IS&T/SPIE Symposium Proceedings, Vol. 2419, February 1995, San Jose, Calif., provides a methodology for the detection of direct scene cuts based on the distribution of motion vectors, and a technique for the location of transitional scene cuts based on DCT DC coefficients. Algorithms disclosed in the article by M. M. Yeung et al., “Video Browsing using Clustering and Scene Transitions on Compressed Sequences,” IS&T/SPIE Symposium Proceedings, February 1995, San Jose, Calif., Vol. 2417, pp. 399-413, enable the browsing of video shots after scene cuts are located. However, the Smoliar et al., Shahraray, and Arman et al. references are limited to scene change detection, and the Meng et al. and Yeung et al. references do not provide any functions for editing compressed video.

Others in the field have attempted to address problems associated with camera operation and moving objects in a video sequence. For example, in the spatial domain, H. S. Sawhney et al., “Model-Based 2D & 3D Dominant Motion Estimation for Mosaicking and Video Representation,” Proc. Fifth Int'l Conf. Computer Vision, Los Alamitos, Calif., 1995, pp. 583-590, proposes to find the parameters of an affine matrix and to construct a mosaic image from a sequence of video images. In a similar vein, the work by A. Nagasaka et al., “Automatic Video Indexing and Full-Video Search for Object Appearances,” in E. Knuth and L. M. Wegner, editors, Video Database Systems, II, Elsevier Science Publishers B. V., North-Holland, 1992, pp. 113-127, proposes searching for object appearances and using them in a video indexing technique.

In the compressed domain, the detection of certain camera operations, e.g., zoom and pan, based on motion vectors has been proposed in both A. Akutsu et al., “Video Indexing Using Motion Vectors,” SPIE Visual Communications and Image Processing 1992, Vol. 1818, pp. 1522-1530, and Y. T. Tse et al., “Global Zoom/Pan Estimation and Compensation For Video Compression,” Proceedings of ICASSP 1991, pp. 2725-2728. In these proposed techniques, simple three-parameter models are employed which require two assumptions, i.e., that camera panning is slow and that the focal length is long. However, such restrictions make the algorithms unsuitable for general video processing.

There have also been attempts to develop techniques aimed specifically at digital video indexing. For example, in the aforementioned Smoliar et al. article, the authors propose using finite state models in order to parse and retrieve domain-specific video, such as news video. Likewise, in A. Hampapur et al., “Feature Based Digital Video Indexing,” IFIP 2.6 Visual Database Systems, III, Switzerland, March 1995, a feature-based video indexing scheme is presented that uses low-level, machine-derivable indices to map into the set of application-specific video indices.

One attempt to enable users to manipulate image and video information was proposed by J. Swartz et al., “A Resolution Independent Video Language,” Proceedings of ACM Multimedia '95, pp. 179-188, as a resolution independent video language (Rivl). However, although Rivl uses direct copying at the group of pictures (GOP) level whenever possible for “cut and paste” operations on MPEG video, it does not use compressed-domain operations at the frame and macroblock levels for special effects editing. Instead, most video effects in Rivl are done by decoding each frame into the pixel domain and then applying image library routines.

The techniques proposed by Swartz et al. and others which rely on performing some or all video data manipulation functions in the uncompressed domain do not provide a useful, truly comprehensive technique for indexing and manipulating digital video. As explained in S.-F. Chang, “Compressed-Domain Techniques for Image/Video Indexing and Manipulation,” IEEE Intern. Conf. on Image Processing, ICIP 95, Special Session on Digital Image/Video Libraries and Video-on-demand, October 1995, Washington D.C., the disclosure of which is incorporated by reference herein, the compressed-domain approach offers several powerful benefits.

First, implementing the same manipulation algorithms in the compressed domain is much cheaper than in the uncompressed domain because the data rate is highly reduced in the compressed domain (e.g., a typical 20:1 to 50:1 compression ratio for MPEG). Second, given that most existing images and videos are stored in compressed form, specific manipulation algorithms can be applied to the compressed streams without full decoding of the compressed images/videos. In addition, because full decoding and re-encoding of video are not necessary, manipulating video in the compressed domain avoids the extra quality degradation inherent in the re-encoding process. Thus, as further explained in the article by the present inventors, J. Meng and S.-F. Chang, “Tools for Compressed-Domain Video Indexing and Editing,” SPIE Conference on Storage and Retrieval for Image and Video Database, Vol. 2670, San Jose, Calif., February 1996, the disclosure of which is incorporated by reference herein, for MPEG compressed video editing, speed performance can be improved by more than 60 times and the video quality can be improved by about 3-4 dB if a compressed-domain approach is used rather than a traditional decode-edit-reencode approach.

A truly comprehensive technique for indexing and manipulating digital video must meet two requirements. First, the technique must provide for key content browsing and searching, in order to permit users to efficiently browse through or search for key content of the video without fully decoding and viewing the entire video stream. In this connection, “key content” refers to key frames in video sequences, prominent video objects and their associated visual features (motion, shape, color, and trajectory), or special reconstructed video models for representing video content in a video scene. Second, the technique must allow for video editing directly in the compressed domain to allow users to manipulate a specific object of interest in the video stream without having to fully decode the video. For example, the technique should permit a user to cut and paste any arbitrary segment from an existing video stream to produce a new video stream which conforms to the valid compression format.

Unfortunately, none of the prior art techniques available at present is able to meet these requirements. Thus, the prior art techniques fail to provide users who want to manipulate compressed digital video information with the necessary tools to extract a rich set of visual features associated with visual scenes and individual objects directly from compressed video, so as not only to enable content-based query searches, but also to allow for integration with domain knowledge for the derivation of higher-level semantics.

SUMMARY OF THE INVENTION

An object of the present invention is to provide comprehensive techniques for indexing and manipulating digital video in the compressed domain.

Another object of the present invention is to provide techniques for key content browsing and searching of compressed digital video without decoding and viewing the entire video stream.

A further object of the present invention is to provide techniques which allow for video editing directly in the compressed domain.

A still further object of the present invention is to provide tools that permit users who want to manipulate compressed digital video information to extract a rich set of visual features associated with visual scenes and individual objects directly from compressed video.

Yet another object of this invention is to provide an architecture which permits users to manipulate compressed video information over a distributed network, such as the Internet.

In order to meet these and other objects which will become apparent with reference to the further disclosure set forth below, the present invention provides a method for detecting moving video objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more previously captured scenes of video. The described method advantageously provides for analyzing a compressed bitstream to locate scene cuts so that at least one sequence of fields or frames of video information which represents a single video scene is determined. The method also provides for estimating one or more operating parameters of a camera which initially captured the video scene, by analyzing a portion of the compressed bitstream which corresponds to the video scene, and for detecting one or more moving video objects represented in the compressed bitstream by applying global motion compensation with the estimated camera operating parameters.

In a preferred process, the compressed bitstream is a bitstream compressed in accordance with the MPEG-1, MPEG-2, H.261, or H.263 video standard. In this preferred embodiment, analyzing can beneficially be accomplished by parsing the compressed bitstream into blocks of video information and associated motion vector information for each field or frame of video information which comprises the determined sequence of fields or frames of video information representative of said single scene, performing inverse motion compensation on each of the parsed blocks of video information to derive discrete cosine transform coefficients for each of the parsed blocks of video information, counting the motion vector information associated with each of the parsed blocks of video information, and determining from the counted motion vector information and discrete cosine transform coefficient information whether one of the scene cuts has occurred.

In an alternative embodiment, analyzing is performed by parsing the compressed bitstream into blocks of video information and associated motion vector information for each field or frame of video information which comprises the determined sequence of fields or frames of video information representative of the scene, and estimating is executed by approximating any zoom and any pan of the camera by determining a multi-parameter transform model applied to the parsed motion vector information. In an especially preferred process, the frame difference due to camera pan and zoom motion may be modeled by a six-parameter affine transform which represents the global motion information representative of the zoom and pan of the camera.

The detecting step advantageously provides for computing local object motion for one or more moving video objects based on the global motion information and on one or more motion vectors which correspond to the one or more moving video objects. In addition, thresholding and morphological operations are preferably applied to the determined local object motion values to eliminate any erroneously sensed moving objects. Further, border points of the detected moving objects are determined to generate a bounding box for each detected moving object.

The present invention also provides for an apparatus for detecting moving video objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more previously captured scenes of video. Usefully, the apparatus includes means for analyzing the compressed bitstream to locate scene cuts therein and to determine at least one sequence of fields or frames of video information which represents a single video scene, means for estimating one or more operating parameters for a camera which initially viewed the video scene by analyzing a portion of the compressed bitstream which corresponds to the video scene, and means for detecting one or more moving video objects represented in the compressed bitstream by applying global motion compensation using the estimated operating parameters.

A different aspect of the present invention provides techniques for dissolving an incoming scene of video information which comprises a sequence of fields or frames of compressed video information to an outgoing scene of video information which comprises a sequence of fields or frames of compressed video information. This technique advantageously provides for applying DCT domain motion compensation to obtain DCT coefficients for all blocks of video information which make up the last frame of the outgoing video scene and the first frame of the incoming video scene, and for creating a frame in the dissolve region from the DCT coefficients of the last outgoing frame and the first incoming frame.

In an especially preferred arrangement, an initial value for a weighting function is selected prior to the creation of a first frame in the dissolve region and is used in the creation of that first frame. The weighting value is then incremented, and a second dissolve frame is generated from the DCT coefficients.
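Because the DCT is a linear transform, a weighted sum of two frames' DCT coefficients equals the DCT of the same weighted sum of their pixels, so a dissolve frame can be formed without leaving the DCT domain. A minimal sketch, assuming both frames have already been brought to full DCT coefficients (e.g., by DCT domain motion compensation) and assuming a hypothetical linear schedule in which the weight starts at 1/(n+1) and grows by the same step for each of the n dissolve frames:

```python
import numpy as np

def dissolve_frames(outgoing: np.ndarray, incoming: np.ndarray, n_frames: int) -> list:
    """Blend the DCT coefficients of the last outgoing frame into those of the
    first incoming frame. Arrays may have any shape, e.g. (rows, cols, 8, 8)."""
    if outgoing.shape != incoming.shape:
        raise ValueError("frames must have the same block layout")
    step = 1.0 / (n_frames + 1)   # hypothetical linear increment
    weight = step                 # initial weight for the first dissolve frame
    frames = []
    for _ in range(n_frames):
        # DCT linearity: DCT((1 - w) p + w q) = (1 - w) DCT(p) + w DCT(q)
        frames.append((1.0 - weight) * outgoing + weight * incoming)
        weight += step            # increment the weight for the next frame
    return frames
```

The blended coefficient arrays would still need to be quantized and entropy coded before they form part of a valid bitstream.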

In yet another aspect of the present invention, a technique for masking a compressed frame of digital video information is provided. The technique first determines whether the frame to be masked is intra-coded, predictive-coded or bi-directionally predictive-coded. If the frame is intra-coded, the technique provides for extracting DCT coefficients for all blocks within the frame, examining each block to determine where in the frame the block is located, setting the DCT coefficients for the block to zero if the block is outside the mask region, and applying a DCT cropping algorithm to the DCT coefficients if the block is on the boundary of the mask region.

If the frame is predictive-coded or bi-directionally predictive-coded, the technique provides for examining the motion vectors associated with each block to determine whether they point to blocks outside or on the boundary of the mask region, and re-encoding the block if a motion vector points to blocks outside, or on the boundary of, the mask region.
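The block classification used by both branches can be sketched as follows. For illustration only, the sketch assumes a rectangular mask given in pixel coordinates, integer-pel motion vectors (MPEG also permits half-pel vectors), and DCT blocks stored in a NumPy array of shape (rows, cols, 8, 8). The DCT cropping of boundary blocks and the actual re-encoding are not shown; the sketch only reports which blocks need them. All names are hypothetical.

```python
from enum import Enum
import numpy as np

BLOCK = 8  # DCT block size in pixels

class Region(Enum):
    INSIDE = "inside"
    OUTSIDE = "outside"
    BOUNDARY = "boundary"

def classify(x0, y0, size, mask):
    """Locate the size x size area with top-left corner (x0, y0) relative to a
    rectangular mask (left, top, right, bottom); right and bottom are exclusive."""
    left, top, right, bottom = mask
    x1, y1 = x0 + size, y0 + size
    if x0 >= left and y0 >= top and x1 <= right and y1 <= bottom:
        return Region.INSIDE
    if x1 <= left or x0 >= right or y1 <= top or y0 >= bottom:
        return Region.OUTSIDE
    return Region.BOUNDARY

def mask_intra_frame(coeffs, mask):
    """I frame: zero the DCT coefficients of blocks outside the mask and return
    the indices of boundary blocks that still need DCT-domain cropping."""
    out = coeffs.copy()
    boundary = []
    rows, cols = coeffs.shape[:2]
    for r in range(rows):
        for c in range(cols):
            region = classify(c * BLOCK, r * BLOCK, BLOCK, mask)
            if region is Region.OUTSIDE:
                out[r, c] = 0.0
            elif region is Region.BOUNDARY:
                boundary.append((r, c))
    return out, boundary

def needs_reencoding(x0, y0, mv, mask, size=16):
    """P/B frame: True if the reference area addressed by motion vector
    mv = (dx, dy) from the macroblock at (x0, y0) lies outside or on the
    boundary of the mask."""
    dx, dy = mv
    return classify(x0 + dx, y0 + dy, size, mask) is not Region.INSIDE
```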

In still another aspect of the present invention, a technique for generating a frozen frame of video information from a sequence of frames of compressed video information is provided. The technique attractively provides for selecting a frame of compressed video information to be frozen, determining whether the frame to be frozen is intra-coded, predictive-coded or bi-directionally predictive-coded, and, if the frame is not intra-coded, converting it to become intra-coded, creating duplicate predictive-coded frames, and arranging the intra-coded frame and the duplicate predictive-coded frames into a sequence of compressed frames of video information.
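The arrangement step can be sketched as follows. A P frame whose macroblocks all have zero motion vectors and no prediction residual decodes to an exact copy of its reference frame, so repeating such a frame after the intra-coded frame holds the picture. The `CodedFrame` type and the `to_intra` placeholder are hypothetical; an actual system would perform the I-frame conversion with DCT domain motion compensation and re-encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodedFrame:
    frame_type: str   # "I", "P" or "B"
    payload: str      # hypothetical stand-in for the coded picture data

def to_intra(frame: CodedFrame) -> CodedFrame:
    """Placeholder for converting a P or B frame to an I frame (hypothetical)."""
    if frame.frame_type == "I":
        return frame
    return CodedFrame("I", f"intra({frame.payload})")

def freeze(frame: CodedFrame, repeats: int) -> list:
    """Return the intra-coded frame followed by `repeats` duplicate P frames."""
    intra = to_intra(frame)
    # Zero motion vectors and no residual: each P frame decodes to a copy of its reference.
    duplicate = CodedFrame("P", "all macroblocks: zero motion, no residual")
    return [intra] + [duplicate] * repeats
```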

In yet a further aspect of the present invention, a system for editing compressed video information over a distributed network is provided. The system includes a client computer, a network link for permitting said client computer to search for and locate compressed video information on said distributed network, and tools for editing a compressed bitstream of video information over the distributed network.

The accompanying drawings, which are incorporated in and constitute part of this disclosure, illustrate a preferred embodiment of the invention and serve to explain the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system in accordance with one aspect of the present invention;

FIG. 2 is a flowchart which illustrates how a scene cut is detected in accordance with one aspect of the present invention;

FIG. 3 is a flowchart which illustrates how camera parameters are estimated in accordance with one aspect of the present invention;

FIG. 4 is a vector diagram which serves to explain global and local motion;

FIG. 5 depicts an exemplary frame of compressed video information and motion vectors for the frame;

FIG. 6 is a flowchart which illustrates global motion compensation in accordance with one aspect of the present invention;

FIG. 7 depicts prior art editing of compressed video bitstreams;

FIG. 8(a) depicts the dissolve effect; FIG. 8(b) is a flowchart which illustrates the dissolve effect;

FIG. 9(a) depicts masking; FIG. 9(b) is a flowchart which illustrates masking;

FIG. 10 is a flowchart which illustrates the freeze-frame effect;

FIG. 11 depicts two alternative techniques for the slow motion effect;

FIG. 12 is a system diagram of a distributed network in accordance with one aspect of the present invention; and

FIG. 13 depicts exemplary techniques which may be executed in the distributed network illustrated in FIG. 12.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 1, an exemplary embodiment of our invention which permits a user to edit and parse visual information in the compressed domain is provided. The architecture of the system 100 is broadly arranged into three functional modules: a parsing module 110, a visualization module 120, and an authoring and editing module 130.

In the parsing module, an incoming bitstream of compressed video information 111, which may be, for example, an MPEG-2 compressed bitstream, is examined for scene cuts 112 and broken into shot segments, where each segment includes one or more fields or frames of compressed video information. In an MPEG-2 bitstream, the shot segments will be made of three types of fields or frames, i.e., Intra-coded (“I”) fields which are coded independently and entirely without reference to other fields, Predictive-coded (“P”) fields which are coded with reference to temporally preceding I or P fields in the sequence, and Bi-directionally predictive-coded (“B”) fields which are coded with reference to the nearest preceding and/or future I or P fields in the sequence. The bitstream will also include associated motion vectors for the P and B fields, which “point” to the blocks of video information in a preceding or succeeding field or frame that are needed to reconstruct the compressed field or frame of video information.
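The reference relationships among I, P and B pictures can be illustrated with a short sketch that, given a group of pictures written in display order (a hypothetical string such as "IBBPBBP"), lists the anchor pictures each one is predicted from. For simplicity it treats each letter as one frame, assumes the group begins with an I frame, and, for trailing B frames whose future anchor would lie in the next group, lists only the preceding anchor.

```python
def reference_frames(gop: str) -> list:
    """Return, for each picture in display order, a tuple of the indices of
    the I or P pictures it is predicted from."""
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = []
    for i, t in enumerate(gop):
        previous = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        if t == "I":
            refs.append(())                                   # coded independently
        elif t == "P":
            refs.append((previous[-1],))                      # nearest preceding I or P
        elif t == "B":
            refs.append(tuple(previous[-1:] + following[:1]))  # nearest anchor on each side
        else:
            raise ValueError(f"unknown picture type {t!r}")
    return refs
```

Because B pictures depend on a later anchor, MPEG transmits each anchor ahead of the B pictures that reference it, so coded order differs from display order.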

The parsing module compiles a list of scene cuts which are useful for indexing the compressed video information. The individual shot segments are next analyzed 113 in order to derive camera operation parameters. In FIG. 1, the vector field 114 is representative of the pan of the camera which originally captured the video information being analyzed. Histogram 115 is used to detect the pan of the camera. Based on the derived operating parameters, moving objects 117 within the compressed video information 116 are detected, and shape and trajectory features for such moving objects are extracted.

In the visualization module 120, the compiled list of scene cuts and the derived camera zoom and pan information are used to extract key frames 121 which represent each video shot. The key frames 121 are placed in a hierarchical arrangement 125 so that they may be readily browsed with a hierarchical video scene browser, such as the browser described in D. Zhong et al., “Clustering Methods for Video Browsing and Annotation,” Storage and Retrieval for Still Image and Video Databases IV, IS&T/SPIE's Electronic Imaging: Science & Tech. 96, Vol. 2670 (1996). A content-based image query system 126 may then be used to index and retrieve key frames or video objects based on their visual features and spatial layout.

In the authoring and editing module 130, software tools are provided not only to enable a user to cut and paste arbitrary compressed video segments to form new video segments, but also to add special effects to such video segments, such as dissolve, key, masking and motion effects.

The parsing module 110 is explained in further detail with reference to FIGS. 2 and 3. Although the preferred steps executed, e.g., on a computer 131, to locate scene cuts in a compressed bitstream are fully disclosed in the above-referenced article by Meng et al., the disclosure of which is incorporated by reference herein, that technique is now described with reference to FIG. 2.

An MPEG-1 or MPEG-2 compressed bitstream 201 that is received by the parsing module 110 is first subjected to parsing 210. Such a bitstream represents an arrangement of N×N blocks of video information that are broadly organized into macroblocks, where each macroblock is defined by four blocks of luminance information and, in the common 4:2:0 format, two blocks of chrominance information (one Cb and one Cr), and further organized into slices which represent contiguous sequences of macroblocks of video information in raster scan order. The N×N blocks of video information are preferably 8×8 blocks. The bitstream also represents associated motion vector information and prediction error difference information which are needed to reconstruct the original blocks of video information.

In the parsing stage 210, the bitstream is parsed down to the fundamental block level 211 by parsing techniques known to those skilled in the art. The parsed blocks of video information are still in compressed format and are thus represented by Discrete Cosine Transform (“DCT”) coefficients which have been quantized, zigzag run-length encoded and variable-length coded, as those skilled in the art will appreciate.
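The zigzag scan and run-length step that produced these coefficients can be sketched as follows. The sketch assumes an 8×8 block of already-quantized integers, treats the DC coefficient as coded separately (as it is for intra blocks), and emits (run, level) pairs in which run counts the zeros preceding each nonzero level; trailing zeros are left implicit, standing in for the end-of-block code. The subsequent variable-length coding of the pairs is omitted, and the function names are hypothetical.

```python
def zigzag_order(n: int = 8) -> list:
    """(row, col) positions in zigzag order: by anti-diagonal, alternating direction."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd diagonals run top-right to bottom-left (row increasing); even ones the reverse.
    return sorted(cells, key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def run_length(block) -> list:
    """(run, level) pairs for the AC coefficients of a quantized block."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block))[1:]:   # skip DC, coded separately
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs                                # trailing zeros implied by end-of-block
```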

In the inverse motion compensation stage 220, the parsed blocks of video information which belong to P and B frames are subjected to inverse motion compensation 220 by using the associated motion vector information to locate reference blocks of video information and reconstruct the DCT coefficients of blocks of video information in the B and P frames. In this step 220, only the first (the “DC”) DCT coefficients are used. Motion vectors associated with the B and P frames are counted 222 for each frame in the sequence.
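When only DC coefficients are needed, one widely used shortcut (an approximation, not necessarily the exact procedure of the referenced Meng et al. article) estimates the DC value of a motion-compensated block as the area-weighted average of the DC values of the up to four reference blocks that its motion vector overlaps, plus the DC value of the coded prediction error. The estimate is exact only when those reference blocks are flat. A sketch, assuming integer-pel motion vectors that stay inside the reference frame and a reference “DC image” holding one DC value per 8×8 block; the function name is hypothetical.

```python
import numpy as np

B = 8  # block size in pixels

def predicted_dc(dc_ref, bx, by, dx, dy, residual_dc=0.0):
    """Approximate the DC coefficient of block (bx, by) of a predicted frame
    whose motion vector (dx, dy) points into a reference frame whose DC
    values are stored per block in dc_ref[row, col]."""
    x, y = bx * B + dx, by * B + dy        # top-left pixel of the reference area
    c0, r0 = x // B, y // B                # reference block holding that pixel
    ox, oy = x - c0 * B, y - r0 * B        # offset of the area inside that block
    total = 0.0
    for r, h in ((r0, B - oy), (r0 + 1, oy)):          # overlapped rows and heights
        for c, w in ((c0, B - ox), (c0 + 1, ox)):      # overlapped columns and widths
            if h and w:                                 # skip empty overlaps
                total += (h * w) / (B * B) * dc_ref[r, c]
    return total + residual_dc
```

For a bidirectionally predicted block, the forward and backward estimates would be averaged before the residual DC is added.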

In the statistical stage 230, three ratios, i.e., the number of intra-coded macroblocks to the number of forward motion vectors, the number of backward motion vectors to the number of forward motion vectors, and the number of forward motion vectors to the number of backward motion vectors, are calculated 231 in order to detect direct scene cuts in P, B, and I frames, respectively. This stage takes advantage of the fact that most video shots of compressed MPEG video are formed by consecutive P, I and B frames that have a high degree of temporal correlation. For P and B frames, this correlation is characterized by the ratio of the number of backward motion vectors, or intra-coded macroblocks, to the number of forward motion vectors. For example, when a direct scene cut occurs on a P frame, most macroblocks will be intra-coded (i.e., no inter-frame prediction).
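The three ratios can be sketched as follows. The `FrameStats` record and the small floor `eps` that prevents division by zero are hypothetical. For an I frame, which carries no motion vectors of its own, the sketch assumes the forward and backward counts are taken from the B frames immediately preceding it: if a cut occurs at the I frame, those B frames predict mainly from the earlier anchor, so forward vectors dominate.

```python
from dataclasses import dataclass

@dataclass
class FrameStats:
    frame_type: str   # "I", "P" or "B"
    n_intra: int      # intra-coded macroblocks
    n_forward: int    # forward motion vectors
    n_backward: int   # backward motion vectors

def cut_ratio(s: FrameStats, eps: float = 1.0) -> float:
    """Ratio whose peak suggests a direct scene cut at this frame."""
    if s.frame_type == "P":
        return s.n_intra / max(s.n_forward, eps)      # intra / forward
    if s.frame_type == "B":
        return s.n_backward / max(s.n_forward, eps)   # backward / forward
    # I frame: counts assumed to come from the B frames just before it.
    return s.n_forward / max(s.n_backward, eps)       # forward / backward
```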

Also in the statistical stage 230, the variance of the DCT DC coefficients of luminance in the I, P and B frames is determined 232, as those skilled in the art will appreciate.

Next, in the detection stage 240, the ratios calculated in 231 are compared to local adaptive thresholds in order to detect the peak values 241. For P frames, the ratio of the number of intra-coded macroblocks to the number of forward motion vectors is examined. For B frames, the ratio of the number of backward motion vectors to the number of forward motion vectors is examined.
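One way to realize a local adaptive threshold is sketched below: a ratio is reported as a peak when it exceeds both a minimum value and a fixed multiple of the largest ratio among its neighbors within a sliding window. The window size, multiplier `k` and `floor` are hypothetical tuning parameters, not values taken from the referenced work.

```python
def detect_peaks(ratios, half_window=3, k=3.0, floor=1.0):
    """Indices of frames whose ratio stands out from its local neighborhood."""
    peaks = []
    for i, r in enumerate(ratios):
        lo, hi = max(0, i - half_window), min(len(ratios), i + half_window + 1)
        neighbours = list(ratios[lo:i]) + list(ratios[i + 1:hi])
        local = max(neighbours, default=0.0)   # adaptive reference level
        if r >= floor and r > k * local:
            peaks.append(i)
    return peaks
```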

Also in the detection stage 240, the variance of the DCT DC coefficients calculated for I and P frames in 232 is used in two ways. For I frames, this variance information is used together with the ratio of the number of forward motion vectors to the number of backward motion vectors determined in 231 in order to detect candidate scene changes 242. If the ratio is above a predetermined threshold, the frame is marked as containing a suspected scene cut. If the variance information differs greatly from the variance information for the immediately preceding I frame, a suspected scene cut is likewise detectable. The variance information is also used directly to detect dissolve regions 243, i.e., regions where one scene is fading out and a second is fading in, by examining the parabolic curve of the variance information.
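The parabolic shape has a simple explanation: if a dissolve frame is the blend αX + (1 − α)Y of two uncorrelated scenes with variances σ_X² and σ_Y², its variance is α²σ_X² + (1 − α)²σ_Y², a convex parabola in α that reaches its minimum partway through the transition. A minimal test for that shape, assuming a hypothetical window of per-frame DC variances and a hypothetical curvature threshold, fits a quadratic and checks that it opens upward with its minimum inside the window:

```python
import numpy as np

def looks_like_dissolve(variances, min_curvature=1e-6):
    """Fit var(t) ~ a t^2 + b t + c over the window and report whether the fit
    is convex (a > min_curvature) with its vertex strictly inside the window."""
    v = np.asarray(variances, dtype=float)
    t = np.arange(v.size, dtype=float)
    a, b, _ = np.polyfit(t, v, 2)          # coefficients, highest power first
    if a <= min_curvature:
        return False
    vertex = -b / (2.0 * a)                # frame index of the fitted minimum
    return bool(0.0 < vertex < v.size - 1)
```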

In the decision stage 250, duplicate detections of scene cuts are eliminated 251 before the list of scenes 252 is determined. In addition, if a suspected scene change occurs within a time threshold $T_{\text{rejection}}$ of a previously detected scene change, no scene change is recorded.
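The decision stage, and the subsequent split of the sequence into shots, can be sketched as follows. The sketch assumes candidate cuts are given as frame indices, that $T_{\text{rejection}}$ is expressed in frames, and that a cut index marks the first frame of a new shot; all three are assumptions for illustration.

```python
def finalize_cuts(candidates, t_rejection):
    """Drop duplicate detections and any cut closer than t_rejection frames to
    the previously accepted cut."""
    cuts = []
    for frame in sorted(set(candidates)):      # removes duplicate detections
        if cuts and frame - cuts[-1] < t_rejection:
            continue                           # too close: no scene change recorded
        cuts.append(frame)
    return cuts

def shot_segments(cuts, n_frames):
    """Split frames 0..n_frames-1 into (first, last) shot ranges at each cut."""
    starts = [0] + [c for c in cuts if 0 < c < n_frames]
    ends = [s - 1 for s in starts[1:]] + [n_frames - 1]
    return list(zip(starts, ends))
```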

With scene cuts detected, the bitstream is broken into scenes or “shots” which represent different sets of visual information that may be of interest to a user. The next block of our parsing module 110, camera operation estimation 113, examines the individual shots for recognizable information which is highly useful in indexing the different shots which have been parsed.

In particular, certain low-level visual features such as camera zoom and pan, and the presence of moving visual objects, are useful information for video indexing. Camera zoom and pan, which give the global motion of the shot being analyzed, can be estimated with a six-parameter affine transform model by using the actual motion vectors from the MPEG compressed stream.

The motion vectors in MPEG are usually generated by block matching: finding a block in the reference frame such that the mean square error of prediction is minimized. Although the motion vectors do not represent the true optical flow, they are still adequate in most cases for estimating the camera parameters in sequences that do not contain large dark or uniform regions.

When the distance between the object or background and the camera is large, a six-parameter affine transform can be used to describe the global motion of the current frame:

$$\begin{bmatrix}u \\ v\end{bmatrix} = \begin{bmatrix}1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y\end{bmatrix} \cdot \begin{bmatrix}a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6}\end{bmatrix}^{T} \tag{1}$$

where $(x, y)$ is the coordinate of a macroblock in the current frame, $[u\ v]^{T}$ is the motion vector associated with that macroblock, and $[a_{1}\ a_{2}\ a_{3}\ a_{4}\ a_{5}\ a_{6}]^{T}$ is the affine transform vector. In order to simplify the mathematics, the following variables can be defined:

$U$ for $[u, v]^{T}$,

$X$ for $\begin{bmatrix}1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y\end{bmatrix}$, and

$\vec{a}$ for $[a_{1}\ a_{2}\ a_{3}\ a_{4}\ a_{5}\ a_{6}]^{T}$.

Given the motion vector for each macroblock, the global parameters can be determined using Least Squares (“LS”) estimation by finding the set of parameters $\vec{a}$ that minimizes the error between the motion vectors estimated in (1) and the actual motion vectors obtained from the MPEG stream:

$$S(\vec{a}) = \sum_{x}\sum_{y}\left[\left(\hat{u}_{xy} - u_{xy}\right)^{2} + \left(\hat{v}_{xy} - v_{xy}\right)^{2}\right] \tag{2}$$

In equation (2), $[\hat{u}_{xy}, \hat{v}_{xy}]^{T}$ is the motion vector estimated from (1) for the macroblock at $(x, y)$.

The vector $\vec{a}$ is then solved for by setting the first derivative of $S(\vec{a})$ to zero, which gives:

$$\begin{bmatrix}N & A & B \\ A & C & E \\ B & E & D\end{bmatrix} \cdot \begin{bmatrix}a_{1} \\ a_{2} \\ a_{3}\end{bmatrix} = \begin{bmatrix}U_{1} \\ U_{2} \\ U_{3}\end{bmatrix} \quad \text{and} \quad \begin{bmatrix}N & A & B \\ A & C & E \\ B & E & D\end{bmatrix} \cdot \begin{bmatrix}a_{4} \\ a_{5} \\ a_{6}\end{bmatrix} = \begin{bmatrix}V_{1} \\ V_{2} \\ V_{3}\end{bmatrix} \tag{3}$$

where

$$N = \sum_{x}\sum_{y} 1,\quad A = \sum_{x}\sum_{y} x,\quad B = \sum_{x}\sum_{y} y,\quad C = \sum_{x}\sum_{y} x^{2},\quad D = \sum_{x}\sum_{y} y^{2},\quad E = \sum_{x}\sum_{y} x \cdot y,$$

$$U_{1} = \sum_{x}\sum_{y} u_{xy},\quad U_{2} = \sum_{x}\sum_{y} u_{xy} \cdot x,\quad U_{3} = \sum_{x}\sum_{y} u_{xy} \cdot y,\quad V_{1} = \sum_{x}\sum_{y} v_{xy},\quad V_{2} = \sum_{x}\sum_{y} v_{xy} \cdot x,\quad V_{3} = \sum_{x}\sum_{y} v_{xy} \cdot y.$$

All summations are computed over all valid macroblocks whose motion vectors survive the nonlinear noise reduction process (for example, median filtering). This process is used to eliminate obvious noise in the motion vector field. After the first LS estimation, motion vectors that lie far from the estimated ones are filtered out before a second LS estimation. The estimation process is preferably iterated several times to refine the accuracy.
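Equations (3) and the iterative refinement can be sketched with NumPy as follows. The sketch solves the two 3×3 normal-equation systems directly, then discards macroblocks whose motion vectors deviate from the fitted global motion by more than a cutoff before refitting. The cutoff rule (twice the standard deviation of the residual distances, but never below half a pixel), the number of iterations, and the use of macroblock grid coordinates for x and y are hypothetical choices, and the median-filter pre-pass is omitted.

```python
import numpy as np

def estimate_affine(xs, ys, us, vs, iterations=3, k_sigma=2.0, min_cutoff=0.5):
    """Least-squares estimate of [a1 .. a6] in eq. (1) from per-macroblock
    coordinates (xs, ys) and motion vectors (us, vs)."""
    xs, ys, us, vs = (np.asarray(t, dtype=float) for t in (xs, ys, us, vs))
    valid = np.ones(xs.shape, dtype=bool)
    for _ in range(iterations):
        x, y, u, v = xs[valid], ys[valid], us[valid], vs[valid]
        N, A, B = float(x.size), x.sum(), y.sum()
        C, D, E = (x * x).sum(), (y * y).sum(), (x * y).sum()
        m = np.array([[N, A, B], [A, C, E], [B, E, D]])                   # matrix of eq. (3)
        a123 = np.linalg.solve(m, [u.sum(), (u * x).sum(), (u * y).sum()])  # U1..U3
        a456 = np.linalg.solve(m, [v.sum(), (v * x).sum(), (v * y).sum()])  # V1..V3
        # Distance between every actual motion vector and the fitted global motion.
        du = us - (a123[0] + a123[1] * xs + a123[2] * ys)
        dv = vs - (a456[0] + a456[1] * xs + a456[2] * ys)
        dist = np.hypot(du, dv)
        cutoff = max(k_sigma * dist[valid].std(), min_cutoff)
        new_valid = dist <= cutoff
        if new_valid.sum() < 3:   # too few macroblocks left to refit
            break
        valid = new_valid
    return np.concatenate([a123, a456])
```

Macroblocks on a moving object produce large residual distances in the first fit and are excluded from later fits, so the final parameters describe the camera motion of the background.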

Referring to FIG. 3, the foregoing process for determining the global motion parameters is illustrated in flow diagram 300. In step 310, each motion vector associated with the B and P frames contained in the shot is decoded. The intermediate variables N, A, B, C, D, and E are accumulated using the x and y coordinates of each macroblock in the current frame of video being analyzed 320. Next, the intermediate transform parameters U₁, U₂, U₃, V₁, V₂, V₃ are calculated using the decoded motion vectors and the x and y coordinates of those macroblocks 330. Finally, the parameter vector $\overset{\rightharpoonup}{a}$ is obtained by inverting the matrix of equation (3) 340.

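After the global camera parameters $\overset{\rightharpoonup}{a}$ are found, object motion can be extracted by applying global motion compensation. Referring to FIG. 4, if an object located at (x, y) in the current frame moves from (x₀, y₀) to (x₁, y₁) in the reference frame with motion vector U, then $U + M = X \cdot \overset{\rightharpoonup}{a}$, where M is the local motion of the object and $X \cdot \overset{\rightharpoonup}{a}$ is the camera-induced motion at (x, y) predicted by the global motion model. Thus, the motion of the local object can be recovered from its associated motion vector as follows:

$\begin{matrix}{M = X \cdot \overset{\rightharpoonup}{a} - U} & (4)\end{matrix}$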

This is referred to as global motion compensation (“GMC”). With reference to FIG. 5, a moving ballerina is shown against a largely stationary background. FIG. 5(a) illustrates the original motion vectors for an exemplary frame of compressed video, and FIG. 5(b) shows the vector map for the same frame after GMC has been applied. As FIG. 5(b) illustrates, GMC gives mostly zero values for the motion vectors of the background, while for the motion vectors of the moving foreground objects it reveals the local motion of the objects.
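Equation (4) can be made concrete with a small C sketch. It assumes, consistently with the normal equations (3), that $X \cdot \overset{\rightharpoonup}{a}$ evaluates to (a₁ + a₂x + a₃y, a₄ + a₅x + a₆y) at macroblock (x, y); the `LocalMotion` type and the `gmc` and `is_moving` functions are hypothetical names.

```c
#include <assert.h>

/* Local motion M of one macroblock after global motion compensation. */
typedef struct { double mx, my; } LocalMotion;

/* Equation (4): M = X*a - U, where a[0..5] hold a1..a6 and (u, v) is the
   motion vector decoded from the bitstream for macroblock (x, y). */
static LocalMotion gmc(const double a[6], double x, double y, double u, double v) {
    LocalMotion m = { a[0] + a[1] * x + a[2] * y - u,
                      a[3] + a[4] * x + a[5] * y - v };
    return m;
}

/* Threshold test on the squared magnitude of M (no sqrt needed). */
static int is_moving(LocalMotion m, double thresh2) {
    assert(thresh2 >= 0.0);
    return m.mx * m.mx + m.my * m.my > thresh2;
}
```

For a pure pan with a = (1, 0, 0, 2, 0, 0), a background block whose vector is (1, 2) yields M = (0, 0), while a block whose vector is (4, 2) yields M = (−3, 0) and is reported as moving.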

The moving objects themselves can be detected by comparing the magnitude of the local motion to a predetermined threshold value and by performing simple morphological operations to delete small false objects and to fill noisy spots. For example, FIG. 5(c) shows an extracted moving object.

The DCT coefficients of the moving object are extracted from the compressed video information to support later querying of the object. The DCT coefficients are extracted using the DCT domain motion compensation algorithm disclosed in U.S. Pat. No. 5,408,274 to Chang et al., the disclosure of which is incorporated by reference herein. The outermost points of the extracted object are used to form a bounding box, as illustrated in FIG. 5(d).

Again referring to FIG. 1, in the visualization module 120, the location and size of each extracted bounding box are saved in a database 126 for later browsing and indexing by a user. Likewise, visual features of extracted objects, such as color, texture, and shape, are used to provide content-based visual querying 127 of these and associated video scenes.

The extraction of moving video objects from a compressed bitstream will now be described with reference to flow diagram 600 of FIG. 6. First, GMC is applied to the present frame of video 610. Second, each globally motion-compensated motion vector is compared to a predetermined threshold value 620 in order to eliminate non-moving parts of the frame from further consideration. Next, the blocks that still have non-zero motion vectors are grouped into contiguous areas and counted 630. For each contiguous area, the number of blocks is compared to a predetermined minimum threshold value 640 in order to eliminate small false objects. Finally, the border points of the remaining objects are identified and saved in a database, e.g., database 126, for later use 650, together with the corresponding DCT coefficients for all blocks within the border. In this way, the important moving video objects can be extracted and indexed for later viewing by a user.
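Steps 620 through 650 might be sketched as follows, operating on the squared magnitudes of globally motion-compensated vectors laid out on a macroblock grid (step 610 is assumed done). Assumptions: 4-connectivity defines a "contiguous" area, the morphological clean-up mentioned earlier is reduced to the minimum-size test of step 640, each object's border is summarized by a bounding box, and `MAX_MB`, `BoundingBox`, and `extract_moving_objects` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_MB 1024 /* hypothetical upper bound on macroblocks per frame */

typedef struct { int min_x, min_y, max_x, max_y, blocks; } BoundingBox;

/* mag2[] holds |M|^2 for a w x h macroblock grid (row-major). Blocks with
   mag2 <= thresh2 are background (step 620); remaining 4-connected regions
   are flood-filled and counted (step 630); regions smaller than min_blocks
   are discarded (step 640); a bounding box is kept for each survivor
   (step 650). Returns the number of boxes written (at most max_boxes). */
static int extract_moving_objects(const double *mag2, int w, int h, double thresh2,
                                  int min_blocks, BoundingBox *boxes, int max_boxes) {
    unsigned char visited[MAX_MB];
    int stack[MAX_MB];
    int count = 0;
    assert(w > 0 && h > 0 && w * h <= MAX_MB);
    memset(visited, 0, sizeof visited);
    for (int start = 0; start < w * h; start++) {
        if (visited[start] || mag2[start] <= thresh2) continue;
        BoundingBox b = { w, h, -1, -1, 0 };
        int top = 0;
        stack[top++] = start;
        visited[start] = 1;
        while (top > 0) {                  /* flood fill one region */
            int i = stack[--top];
            int x = i % w, y = i / w;
            b.blocks++;
            if (x < b.min_x) b.min_x = x;
            if (x > b.max_x) b.max_x = x;
            if (y < b.min_y) b.min_y = y;
            if (y > b.max_y) b.max_y = y;
            const int nx[4] = { x - 1, x + 1, x, x };
            const int ny[4] = { y, y, y - 1, y + 1 };
            for (int k = 0; k < 4; k++) {
                if (nx[k] < 0 || nx[k] >= w || ny[k] < 0 || ny[k] >= h) continue;
                int j = ny[k] * w + nx[k];
                if (!visited[j] && mag2[j] > thresh2) {
                    visited[j] = 1;        /* each block is pushed once */
                    stack[top++] = j;
                }
            }
        }
        if (b.blocks >= min_blocks && count < max_boxes)
            boxes[count++] = b;
    }
    return count;
}
```

An isolated noisy block therefore never becomes an object as long as `min_blocks` is at least 2, while a compact cluster of moving blocks yields one box.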

With reference to FIG. 7, useful techniques for directly editing compressed digital video will now be described. In general, editing of compressed video is directed to permitting a user to cut a first segment of video 710 from a first video sequence 720 and a second segment of video 730 from a second video sequence 740 to form a new bitstream of video information 750. Such techniques have been described in the art, including in the article by the present inventors, J. Meng et al., “Tools for Compressed-Domain Video Indexing and Editing,” SPIE Conf. on Storage and Retrieval for Image and Video Databases, Vol. 2670 (1996), the disclosure of which is incorporated by reference herein.

In addition to the basic “cut and paste” editing function, several more advanced visual effects can be created in the compressed domain. For I frames, the basic compression component is the Discrete Cosine Transform (DCT), which can be written in the following form:

$\begin{matrix}{F(u,v) = \mathrm{DCT}\left(f(x,y)\right)} & (5)\end{matrix}$

Given the DCT, linear operations such as intensity addition and scaling can be performed in accordance with equations (6) and (7):

$\begin{matrix}{\mathrm{DCT}\left(f_{1}(x,y) + f_{2}(x,y)\right) = F_{1}(u,v) + F_{2}(u,v)} & (6)\end{matrix}$

$\begin{matrix}{\mathrm{DCT}\left(\alpha \cdot f(x,y)\right) = \alpha \cdot F(u,v)} & (7)\end{matrix}$

Algorithms for other operations, such as spatial scaling, translation, and filtering in the DCT domain, are well known in the art. Generally, the DCT of the output video Y can be obtained by linear matrix operations on the input DCT blocks $P_{i}$, as follows:

$\begin{matrix}{Y = {\sum\limits_{i}{W_{i} \cdot P_{i} \cdot H_{i}}}} & (8)\end{matrix}$

where $H_{i}$ and $W_{i}$ are special filter coefficient matrices in the DCT domain. For motion-compensated B and P frames, the compressed-domain manipulation functions can be implemented in two ways. First, transform-domain techniques can be used to convert B and P frames to intraframe DCT coefficients, to which the above techniques can readily be applied. Alternatively, the B or P structure (i.e., the DCT coefficients of the residual errors and the motion vectors) can be kept, and algorithms can be developed that operate directly on these data. Several advanced visual effects that can be created in the compressed domain (dissolve, masking, freeze frame, variable speed, and strobe motion) are now described in detail.

One of the most important tools used in film editing is the dissolve. As illustrated in FIG. 8(a), a dissolve is the technique in which an outgoing video scene 801 is faded out while an incoming video scene 802 is faded in. To perform a dissolve on two different scenes of video, the actual DCT coefficients for each block of compressed video in the last frame of the outgoing video scene, and the DCT coefficients for each block of compressed video in the first frame of the incoming video scene, must be extracted. One technique for extracting such DCT coefficients is described in the above-mentioned Chang et al. patent, which uses DCT domain inverse motion compensation to obtain the DCT coefficients for all blocks of video information that make up the needed frames of video.

When there is little or no motion in the two video scenes, the dissolve effect can be approximated by a linear combination of the two video scenes F₁ and F₂:

$\begin{matrix}{F(u,v,t) = \alpha(t) \cdot F_{1}(u,v,t_{1}) + \left(1 - \alpha(t)\right) \cdot F_{2}(u,v,t_{2})} & (9)\end{matrix}$

where u and v are coordinates within a frame, t is the frame index, which may range from 1 to n, n being the total number of frames in the dissolve region, and α(t) is a weighting function that varies from 100% to 0%. F₁ is the composite of the derived DCT coefficients for all blocks that make up the last frame of the outgoing video scene, and F₂ is the composite of the derived DCT coefficients for all blocks that make up the first frame of the incoming video scene. The resulting effect is a dissolve transition from a particular frozen frame of the outgoing video scene to a frozen frame of the incoming video scene.

Therefore, to smooth out the dissolve when one or both of the dissolving scenes contain moving video objects, it is desirable to re-encode several dissolving frames over a transitional period t₁-t₂. The reader is referred to the Appendix of this patent document for our preferred source code for implementing a dissolve.

Referring to FIG. 8(b), a method for dissolving an incoming scene and an outgoing scene of video is now described by way of flowchart 800. In step 810, DCT domain motion compensation is used to obtain the DCT coefficients for all blocks of video information that make up the last frame of the outgoing video scene, F₁. In step 820, DCT domain motion compensation is used to obtain the DCT coefficients for all blocks of video information that make up the first frame of the incoming video scene, F₂. In step 830, the initial value of the weighting function α(t) is chosen. In step 840, equation (9) is applied to create the first frame in the dissolve region. The value of t is then incremented until a final value n is reached 850. Steps 830-850 are repeated to create all dissolve frames for t = 1 to n.
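A block-level sketch of steps 830 through 850, applying equation (9) to two frames of DCT coefficients, follows. The `DCTBlock` layout, the function names, the linear weighting α(t) = 1 − t/n (chosen to mirror the Appendix, where the incoming frame is weighted by s/duration), and the clamp to [−2048, 2047] taken from the Appendix are assumptions of the sketch; re-encoding the blended frames as P frames is not shown.

```c
#include <assert.h>

/* Hypothetical layout: one color component of a frame is an array of
   8x8 blocks of DCT coefficients. */
typedef short DCTBlock[64];

/* Equation (9) for dissolve frame t (1 <= t <= n): f1 holds the last
   outgoing frame F1, f2 the first incoming frame F2. */
static void dissolve_frame(DCTBlock *out, DCTBlock *f1, DCTBlock *f2,
                           int nblocks, int t, int n) {
    assert(n > 0 && t >= 1 && t <= n && nblocks >= 0);
    double alpha = 1.0 - (double)t / n;      /* assumed linear weighting */
    for (int b = 0; b < nblocks; b++)
        for (int k = 0; k < 64; k++) {
            double x = alpha * f1[b][k] + (1.0 - alpha) * f2[b][k];
            if (x < -2048.0) x = -2048.0;    /* clamp as in the Appendix */
            if (x > 2047.0) x = 2047.0;
            out[b][k] = (short)x;            /* truncates toward zero */
        }
}

/* Steps 830-850: create all n dissolve frames, frames[t-1] for t = 1..n. */
static void dissolve_all(DCTBlock **frames, DCTBlock *f1, DCTBlock *f2,
                         int nblocks, int n) {
    for (int t = 1; t <= n; t++)
        dissolve_frame(frames[t - 1], f1, f2, nblocks, t, n);
}
```

Because the DCT is linear (equations (6) and (7)), blending coefficients in this way gives the same result, up to rounding, as blending the decoded pixels.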

A second important tool used in film editing is masking. As illustrated in FIG. 9(a), the film effect of masking refers to transforming an original video scene having, e.g., a 4:3 aspect ratio to a different aspect ratio such as 1:1.66, 1:1.85, 1:2.35, or 16:9. Masking can also be used to crop part of the frame region to a different frame size. For I frames, the DCT coefficients of blocks outside the desired region are set to 0, and the coefficients of blocks that lie on the masking boundaries can be recalculated using a simplified DCT cropping algorithm:

$\begin{matrix}{{{{DCT}(B)} = {{{DCT}(H)} \cdot {{DCT}(A)}}},{{{where}\mspace{14mu} H} = \begin{bmatrix}0 & 0 \\0 & I_{h}\end{bmatrix}}} & (10)\end{matrix}$

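where A is an original block located on the boundary, B is the new masked block, and $I_{h}$ is the h×h identity matrix, as shown in FIG. 9(a). The product B = H·A retains the last h rows of A and sets the remaining rows to zero; because the 2-D DCT is an orthonormal transform, this product can be computed directly as the product of the DCTs.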
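The cropping identity of equation (10) can be checked numerically. The sketch below assumes the orthonormal 8×8 DCT-II, writes the 2-D DCT of a block X as T·X·Tᵗ, and builds H as the 8×8 matrix with $I_{h}$ in its lower-right corner; all helper names are hypothetical, and the cosine table is generated by a recurrence only so that the example needs no math library.

```c
#include <assert.h>

#define BS 8

/* c[k] = cos(k*pi/16), k = 0..31, via cos((k+1)t) = 2cos(t)cos(kt) - cos((k-1)t). */
static void cos_table(double c[32]) {
    c[0] = 1.0;
    c[1] = 0.98078528040323044913; /* cos(pi/16) */
    for (int k = 1; k < 31; k++)
        c[k + 1] = 2.0 * c[1] * c[k] - c[k - 1];
}

/* Orthonormal DCT-II matrix T[u][x] = s(u) cos((2x+1)u pi/16),
   s(0) = sqrt(1/8), s(u) = 1/2 otherwise; cos has period 32 in k. */
static void dct_matrix(double T[BS][BS]) {
    double c[32];
    cos_table(c);
    for (int u = 0; u < BS; u++)
        for (int x = 0; x < BS; x++)
            T[u][x] = (u == 0 ? 0.35355339059327376220 : 0.5) * c[((2 * x + 1) * u) % 32];
}

/* R = P * Q (R must not alias P or Q). */
static void matmul(double R[BS][BS], double P[BS][BS], double Q[BS][BS]) {
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++) {
            double s = 0.0;
            for (int k = 0; k < BS; k++) s += P[i][k] * Q[k][j];
            R[i][j] = s;
        }
}

/* 2-D DCT of an 8x8 block: out = T X T^t. */
static void dct2(double out[BS][BS], double X[BS][BS]) {
    double T[BS][BS], Tt[BS][BS], tmp[BS][BS];
    dct_matrix(T);
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++) Tt[i][j] = T[j][i];
    matmul(tmp, T, X);
    matmul(out, tmp, Tt);
}

/* Equation (10): DCT(B) = DCT(H) DCT(A) with H = [[0,0],[0,I_h]], using
   only DCT-domain data. Holds because T H A T^t = (T H T^t)(T A T^t). */
static void dct_crop_rows(double dctB[BS][BS], double dctA[BS][BS], int h) {
    double H[BS][BS] = { { 0 } }, dctH[BS][BS];
    assert(h >= 0 && h <= BS);
    for (int i = BS - h; i < BS; i++) H[i][i] = 1.0;
    dct2(dctH, H);
    matmul(dctB, dctH, dctA);
}
```

In practice DCT(H) depends only on h, so the eight possible matrices could be precomputed. Cropping columns works the same way with H on the right, since DCT(A·H) = DCT(A)·DCT(H).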

For P and B frames, only macroblocks with motion vectors pointing outside the masking region need to be re-encoded. Macroblocks with motion vectors pointing inside do not need any modification. Efficient algorithms for re-encoding macroblocks are well known in the art and are described, for example, in the above-referenced Chang et al. patent.

Referring to FIG. 9(b), a method for performing masking is now described. In step 910, the frame to be masked is examined to determine whether it is an I, P, or B frame. If the frame is an I frame 911, all DCT coefficients for all blocks within the frame are extracted 920. Block$_{n}$ is examined 930 to determine where in the frame it is located. If the block is inside the mask region 931, its DCT coefficients are left unchanged 941. If the block is outside the mask region 932, its DCT coefficients are set to zero 942. If the block is on the boundary of the mask region 933, the DCT cropping algorithm, i.e., equation (10), is applied 943. The block index n is incremented 950 and steps 930-950 are repeated until all blocks have been examined.

If, however, the frame to be masked is a P or B frame 920, the motion vectors associated with block$_{n}$ are examined 960 to determine whether they point to blocks outside or on the mask region 970. If a motion vector points to blocks outside or on the mask region 972, the macroblock is re-encoded 980. The block index n is incremented 990 and steps 960-990 are repeated until all blocks have been examined.
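The block decisions of FIG. 9(b) can be sketched for a rectangular mask. Assumptions: the mask is an axis-aligned rectangle in pixel units, motion vectors are in whole pixels (MPEG vectors are actually half-pel), and a P or B macroblock is flagged for re-encoding when the reference area its vector points to is not entirely inside the mask; `MaskRect`, `classify_block`, and `needs_reencode` are hypothetical names.

```c
#include <assert.h>

typedef enum { BLOCK_INSIDE, BLOCK_OUTSIDE, BLOCK_BOUNDARY } BlockClass;

/* Hypothetical rectangular mask region [x0, x1) x [y0, y1), in pixels. */
typedef struct { int x0, y0, x1, y1; } MaskRect;

/* Steps 930-933: classify the size x size block whose top-left pixel is
   (px, py) relative to the mask region. */
static BlockClass classify_block(MaskRect m, int px, int py, int size) {
    assert(size > 0 && m.x0 <= m.x1 && m.y0 <= m.y1);
    int px1 = px + size, py1 = py + size;
    if (px >= m.x0 && px1 <= m.x1 && py >= m.y0 && py1 <= m.y1)
        return BLOCK_INSIDE;     /* step 941: keep coefficients */
    if (px1 <= m.x0 || px >= m.x1 || py1 <= m.y0 || py >= m.y1)
        return BLOCK_OUTSIDE;    /* step 942: set coefficients to zero */
    return BLOCK_BOUNDARY;       /* step 943: apply equation (10) */
}

/* Steps 960-980: a P or B macroblock at (px, py) with motion vector
   (mvx, mvy) needs re-encoding if its reference area is outside or on
   the mask region. */
static int needs_reencode(MaskRect m, int px, int py, int mvx, int mvy, int size) {
    return classify_block(m, px + mvx, py + mvy, size) != BLOCK_INSIDE;
}
```

With a letterbox mask covering rows 16 to 47 of a 64-pixel-wide frame, an 8×8 block starting at row 12 straddles the top edge and is classified as a boundary block.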

A third important tool used in film editing is the freeze effect. Since a freeze usually lasts longer than 1 second, simply repeating duplicate frames is not desirable if interactive playback (e.g., random search) is desired. Instead, the freeze-frame effect requires the use of periodic I frames.

Thus, to ensure interactive playback capability, if the frozen frame is a P or B frame, it should be converted to an I frame. To convert a P or B frame into an I frame, every block of video information in the frame that has an associated motion vector is converted into a pure array of DCT coefficients, without motion vector information. The above-mentioned DCT domain motion compensation algorithm described in the Chang et al. patent is advantageously used to effect this conversion. The converted blocks can be referred to as intracoded blocks. Thereafter, the group of pictures that represents the frozen frame is filled with duplicated P frames, with all macroblocks in the duplicated P frames set to Motion Compensation Not Coded (i.e., zero motion vector and zero residual error). Duplicate P frames can easily be created independently and inserted after I or P frames, as those familiar with digital compression techniques will understand.

A method for generating a frozen frame of video information is now described with reference to flowchart 1000 in FIG. 10. In step 1010, a user-defined frame of compressed video information is selected for the freeze-frame effect. In step 1020, the frame is examined to determine whether it is an I, P, or B frame. If the frame is not an I frame 1021, it is converted into an I frame 1030. Finally, the original 1022 or converted 1030 I frame is used to create duplicate P frames 1040. Periodic I frames can be inserted to increase interactivity and to maintain a compatible bit rate.
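The frame-type pattern produced by flowchart 1000 can be illustrated with a short sketch. It assumes the frozen picture is already available as an I frame (original 1022 or converted 1030), represents each output frame by its type letter only, and re-inserts a copy of that I frame every `i_period` frames; `i_period` and `freeze_pattern` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Writes a NUL-terminated frame-type pattern for a freeze lasting
   total_frames frames into out (room for total_frames + 1 chars).
   'I' is the frozen picture coded as an I frame; 'P' is a duplicated P
   frame whose macroblocks are all "not coded" (zero motion vector, zero
   residual). The I frame is repeated every i_period frames so that
   random access into the freeze remains possible. */
static void freeze_pattern(char *out, size_t total_frames, size_t i_period) {
    assert(out != NULL && i_period > 0);
    for (size_t k = 0; k < total_frames; k++)
        out[k] = (k % i_period == 0) ? 'I' : 'P';
    out[total_frames] = '\0';
}
```

For example, `freeze_pattern(buf, 8, 4)` yields `"IPPPIPPP"`. A smaller `i_period` improves random access at the cost of more bits, since each repeated I frame is much larger than a not-coded P frame.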

A fourth important tool used in film editing is variable speed playback. Faster than normal playback, e.g., fast forward, is simply realized by dropping B, then P, and then I frames according to the desired speed.

With respect to slow motion, there are two possible approaches, depending on the slow motion rate. With reference to FIG. 11, one approach is simply to insert a duplicate B frame (B′) whenever an original B frame appears. A duplicate B frame copies all DCT coefficients and associated motion vectors from the previous B frame, so no decoding or re-encoding is involved. However, this approach has a drawback: the I/P frame delay is increased by a factor equal to the inverse of the motion rate. For example, in the illustrative sequence of frames shown in FIG. 11, the reference frame I₀ must be transmitted 4 frames earlier. This limitation, which also requires an increased decoder buffer size, makes the approach suitable only for moderate slow motion, down to about ½ the normal frame rate.
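The first slow-motion approach can be illustrated at the level of frame types in display order. The sketch below is hypothetical and does not reproduce FIG. 11: a lowercase `'b'` stands for a duplicate B′ frame that reuses the DCT coefficients and motion vectors of the preceding B frame, and reordering into transmission order (where the extra delay of the reference frames appears) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the display-order frame types in `in` to `out`, following every
   original 'B' with a duplicate 'b' (B'). Nothing is decoded or
   re-encoded. out must hold up to 2 * strlen(in) + 1 chars. Returns the
   number of frames written. */
static size_t slow_motion_dup_b(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (const char *p = in; *p != '\0'; p++) {
        out[n++] = *p;
        if (*p == 'B')
            out[n++] = 'b';
    }
    out[n] = '\0';
    return n;
}
```

For example, the display sequence `"IBBPBBP"` becomes `"IBbBbPBbBbP"`.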

In the second approach, original P and B frames are converted to I frames using the DCT domain techniques described above, and duplicated P frames are inserted between the I frames, as in the case of freeze frame. This approach reduces the frame delay, at the cost of extra DCT domain manipulations.

Another interesting tool used in film editing is the special effect of strobe motion, which is simply a combination of freeze frame and variable speed playback. Strobe motion is effected by dropping original B and/or P frames and inserting duplicated P frames.

Referring to FIG. 12, a system for enabling distributed clients to browse and manipulate digital video by way of a remote server or host computer is now described. A master server 1210 or a group of distributed servers is linked to remote servers 1220, 1230, 1240 and to several clients 1250, 1260. From their client workstations, users can browse and edit compressed video images in the manner described above.

Moving to FIG. 13, the master server 1210 acts as a content aggregator 1310 to collect video content, and as a content analyzer 1320 to analyze the visual content for efficient indexing 1330. Thus, the server 1210 may perform the functions of the parsing module 110 described above. The server 1210 also provides an editing engine 1340 with basic editing tools, such as the tools described above, for rendering basic editing functions and various special effects. Distributed clients 1250, 1260 access the video archives of the server through heterogeneous networks with interoperable interfaces, such as JAVA or MPEG MSDL manipulation tools.

The video server 1210 can be linked with other distributed servers 1220, 1230, 1240 that have video search engines which search for images and video over a network, e.g., the World Wide Web. Once a video file is found on any other host or online content provider, it is downloaded and preprocessed by the video server 1210 to extract “keyframes,” i.e., frames 121 stored in the above-described visualization module 120, and associated visual features such as camera motion, moving objects, color, texture, and temporal features. The Uniform Resource Locator (“URL”) of the video and the extracted features are stored on the video server 1210. This client-server model gives clients 1250, 1260 access to resources far richer than those provided by the client's local system.

The client 1260 may open any video shot stored at the server 1210 and browse 1330 the keyframes hierarchically using any one of several viewing models, including a sequential view, a feature-based view, and a story-based view. A sequential view arranges the video shots in sequential time order; a feature-based view clusters video shots with similar visual features into the same group; a story view presents video shots according to the story structure at the semantic level. For example, for news applications, the story model may use the anchorperson shot as the criterion for automatically segmenting video into semantic-level stories. The client can also perform shot-level or frame-level editing 1390 using the editing tools discussed above. Clients may also submit their own videos to be analyzed and indexed by the server 1210.

To view or further refine retrieved video shots, the client 1260 may choose different levels of resolution for the video shots sent by the server 1210. There are three broad levels of video rendering. At the first level, the client 1260 can render only straight cuts at low resolution, without any special effects. At the second level, the client may send information to the server defining low-resolution video with the desired special effects to be generated. Finally, when the client 1260 no longer wishes to perform editing, the master server 1210 generates a full-resolution video with all the effects requested by the client from the highest quality source video, which is located at either the master server 1210 or the distributed remote content servers 1220, 1230, 1240. In this way, the user is given the flexibility to trade off quality against speed.

To obtain a reduced resolution video sequence, a low resolution icon stream must be extracted from the full resolution video for each video sequence stored on a server. The icon stream is an 8:1 downsampled version of the original video, with one DC value per 8×8 block. For I frames, the DCT DC coefficients are obtained directly. For B and P frames, DCT domain inverse motion compensation is applied to the DCT coefficients. Finally, the extracted coefficients are converted into a smaller sized, e.g., 8:2:1, I frame, as those skilled in the art will appreciate.
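The relationship between DC coefficients and the 8:1 icon stream can be sketched as follows. For the orthonormal 8×8 DCT used in MPEG, F(0,0) = (1/8)·ΣΣ f(x,y), so DC/8 is the block mean. To keep the sketch self-contained, the DC values are computed here from pixels; in the compressed domain they would be read from the bitstream for I frames or obtained by DCT domain inverse motion compensation for P and B frames. The function names are hypothetical, and quantization is ignored.

```c
#include <assert.h>

/* DC term of the orthonormal 8x8 DCT for block (bx, by) of an 8-bit
   image with row stride `stride`: F(0,0) = (1/8) * sum of 64 samples. */
static double block_dc(const unsigned char *img, int stride, int bx, int by) {
    double sum = 0.0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += img[(by * 8 + y) * stride + bx * 8 + x];
    return sum / 8.0;
}

/* 8:1 icon image: one value per 8x8 block, DC / 8 = block mean. */
static void icon_from_dc(const unsigned char *img, int w, int h, double *icon) {
    assert(w > 0 && h > 0 && w % 8 == 0 && h % 8 == 0);
    for (int by = 0; by < h / 8; by++)
        for (int bx = 0; bx < w / 8; bx++)
            icon[by * (w / 8) + bx] = block_dc(img, w, bx, by) / 8.0;
}
```

A 16×8 image whose left half is 80 and right half is 200 therefore yields DC values of 640 and 1600 and a 2×1 icon of (80, 200).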

The foregoing merely illustrates the principles of the invention. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the inventors' teachings herein. For example, while the above description was directed to indexing and editing digital video that has been compressed in accordance with the MPEG format, the foregoing is equally applicable to other compressed bitstreams, e.g., Motion JPEG, H.261 or H.263, or any other algorithm based on transform coding and motion compensation. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the invention and are thus within the spirit and scope of the invention.

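APPENDIX

```c
/* Dissolve from the end of clip S1 into the beginning of clip S2 over
   `duration` frames, working directly on DCT coefficients. Relies on
   globals and helpers of the surrounding codec (ld, S1, S2, initbits,
   roundtoP, do_cut, getDCTframe, putpictDCT, refDCT, oldrefDCT, auxDCT,
   newDCT, tmpDCT, CLAMP, ...), which are not reproduced here. */
int do_dissolve(int duration)
{
    int i, j, k, s, cc;
    int blks;
    double alpha;
    short (*tmp[3])[64];

    /* Outgoing clip: cut it and get the DCT frame at its end point. */
    ld = &S1;
    initbits();
    ld->end = roundtoP(ld->end);
    do_cut(ld->begin, ld->end);
    getDCTframe(ld->end, GETDCT);

    /* Copy the first (outgoing) DCT frame to tmpDCT. */
    for (cc = 0; cc < 3; cc++) {
        if (cc == 0)
            blks = (coded_picture_width * coded_picture_height) >> 6;
        else
            blks = (chrom_width * chrom_height) >> 6;
        memcpy(tmpDCT[cc], refDCT[cc], blks * sizeof(short[64]));
    }

    /* Incoming clip: get the DCT frame at its start point. */
    ld = &S2;
    initbits();
    ld->begin = roundtoP(ld->begin);
    getDCTframe(ld->begin, GETDCT);

    /* The first prediction is based on tmpDCT. */
    for (cc = 0; cc < 3; cc++) {
        if (cc == 0)
            blks = (coded_picture_width * coded_picture_height) >> 6;
        else
            blks = (chrom_width * chrom_height) >> 6;
        memcpy(auxDCT[cc], tmpDCT[cc], blks * sizeof(short[64]));
    }

    for (s = 1; s <= duration; s++) {
        alpha = s / (double)duration;

        /* Rotate buffers: the previous output becomes oldrefDCT. */
        for (i = 0; i < 3; i++) {
            tmp[i] = oldrefDCT[i];
            oldrefDCT[i] = auxDCT[i];
            auxDCT[i] = tmp[i];
            newDCT[i] = auxDCT[i];
        }

        /* Equation (9): blend outgoing (tmpDCT) and incoming (refDCT). */
        for (cc = 0; cc < 3; cc++) {
            if (cc == 0)
                blks = (coded_picture_width * coded_picture_height) >> 6;
            else
                blks = (chrom_width * chrom_height) >> 6;
            for (j = 0; j < blks; j++)
                for (k = 0; k < 64; k++)
                    newDCT[cc][j][k] = (short)CLAMP(-2048,
                        (1 - alpha) * tmpDCT[cc][j][k] + alpha * refDCT[cc][j][k],
                        2047);
        }

        /* Use oldrefDCT as the reference frame for DPCM of the P frame. */
        putpictDCT(0);
    }
    return 0;
}
```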

1-18. (canceled)
19. A method for masking a region of a compressed frame of digital video information, comprising the steps of: a. determining whether said frame to be masked is intra-coded, predictive-coded or bi-directionally predictive-coded; b. if said frame is intra-coded: i. extracting DCT coefficients for all blocks within said frame; ii. examining block$_{n}$ to determine where in said frame said block is located; iii. setting said DCT coefficients for said block to zero if said block is outside said mask region; iv. applying a DCT cropping algorithm to said DCT coefficients if said block is on the boundary of said mask region; and v. repeating steps (b)(ii)-(b)(iv) for each block in said frame; c. if said frame is predictive-coded or bi-directionally predictive-coded: i. examining motion vectors associated with block$_{n}$ to determine whether they point to blocks outside or on said mask region; ii. reencoding said block if a motion vector points to blocks outside or on said mask region; and iii. repeating steps (c)(i)-(c)(ii) for all blocks in said frame.
20. A method for generating a reduced speed sequence of frames of video information from a sequence of frames of compressed video information, comprising the steps of: a. selecting a frame of compressed video information to be repeated; b. determining whether said frame to be repeated is intra-coded, predictive-coded or bi-directionally predictive-coded; c. converting said frame into an intra-coded frame if said frame is a predictive-coded or bi-directionally predictive-coded frame; d. creating duplicate predictive-coded frames; and e. arranging said determined frame and said duplicate predictive-coded frames into a sequence of compressed frames of video information.
21. The method of claim 20, wherein said reduced speed sequence of frames generates a frozen frame effect.
22. The method of claim 20, further comprising the step of converting said frame into an intra-coded frame if said frame is a predictive-coded or bi-directionally predictive-coded frame.
23. A system for editing compressed video information over a distributed network, comprising: a. a client computer; b. a network link, coupled to said client computer, for permitting said client computer to search for and locate compressed video information on said distributed network; and c. means for editing a compressed bitstream of video information over said distributed network.
24. The system of claim 23, wherein said editing means includes means for dissolving an incoming scene of video information which includes a sequence of fields or frames of compressed video information and an outgoing scene of video information which includes a sequence of fields or frames of compressed video information, said dissolving means comprising: a. outgoing motion compensation means for applying DCT domain motion compensation to a last frame of said outgoing video scene to obtain DCT coefficients for all blocks of video information which make up said last frame of said outgoing video scene; b. incoming motion compensation means, for applying DCT domain motion compensation to a first frame of said incoming video scene to obtain the DCT coefficients for all blocks of video information which make up said first frame of said incoming video scene; and c. dissolve region creating means, coupled to said incoming motion compensation means and to said outgoing motion compensation means, for creating a first frame in a dissolve region from said DCT coefficients of said last outgoing frame and said first incoming frame.
25. The system of claim 23, wherein said editing means includes means for masking a region of a compressed frame of digital video information, said masking means comprising: a. means for determining whether said frame to be masked is intra-coded, predictive-coded or bi-directionally predictive-coded; b. means for processing intra-coded frames including: i. means for receiving DCT coefficients and for extracting DCT coefficients for all blocks within said frame; ii. means, coupled to said receiving means, for examining blocks of compressed video information in said frame to determine where in said frame said block is located; iii. means, coupled to said examining means, for setting said DCT coefficients for said block to zero if said block is outside said mask region; and iv. means, coupled to said setting means, for applying a DCT translation algorithm to said DCT coefficients if said block is on the boundary of said mask region; and c. means for processing predictive-coded and bi-directionally predictive-coded frames, including: i. means for receiving and examining motion vectors associated with blocks of video information to determine whether they point to blocks outside or on said mask region; and ii. means, coupled to said receiving means, for reencoding said block if a motion vector points to blocks outside or on said mask region.
26. The system of claim 23, wherein said editing means includes means for generating a reduced speed sequence of frames of video information from a sequence of frames of compressed video information, comprising: a. selection means for selecting a frame of compressed video information to be repeated; b. computational means, coupled to said selection means, for determining whether said frame to be repeated is intra-coded, predictive-coded or bi-directionally predictive-coded; c. frame generating means, coupled to said computational means and to said converting means, for creating duplicate predictive-coded frames; and d. frame arranging means, coupled to said frame generating means, for arranging said determined frame and said duplicate predictive-coded frames into a sequence of compressed frames of video information.
27. A method for converting a full resolution compressed domain bitstream into a reduced resolution compressed bitstream, comprising the steps of: a. examining a frame of compressed video information from said full resolution bitstream; b. determining whether said examined frame is intra-coded, predictive-coded or bi-directionally predictive-coded; c. extracting DCT DC coefficients for said determined frame if said determined frame is intra-coded; d. applying DCT domain inverse motion compensation to said frame to extract DCT DC coefficients if said frame is predictive-coded or bi-directionally predictive-coded; and e. converting said extracted DCT DC coefficients into DCT DC coefficients for a reduced size intra-coded frame of video.